In these exercises we will see the power of the libraries
ggplot2 and plotly to make sense of
statistical data. The goal is to reproduce the moving chart that you can
see in this video from Hans Rosling. I invite you to watch his other
videos; they are quite enlightening and inspiring.
For this, we will need to gather the data:
The first thing to do is to load all these datasets and combine them into a single one.
Load the tidyverse library and, using
read_csv(), load the 4 datasets in 4 separate tibbles
called children, income, pop and
religion.
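As a minimal sketch of the read_csv() call: the paths of the four files are not given here, so the example below reads a small made-up table from a literal string instead. The name children_toy, the values and the wide layout (one column per year) are assumptions for illustration only; with a real file, its path would replace I(csv_text).

```r
library(tidyverse)

# Made-up data in an assumed wide layout (one column per year); not the real dataset.
csv_text <- "Country,1800,1801,1802
France,4.4,4.4,4.4
Spain,5.1,5.1,5.1
"

# I() tells read_csv() that the string is literal data rather than a file path.
children_toy <- read_csv(I(csv_text), show_col_types = FALSE)
children_toy
```

The same call, repeated with each file path, would give the four tibbles children, income, pop and religion.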
To reproduce the chart on the video, we need to determine the
dominant religion in each country. In the religion dataset,
add a column Religion that will give the name of the
dominant religion for each country. For this, you might want to use this
method that returns the name of the column containing the maximum of
each row of a table:
DF <- tibble(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
DF
## # A tibble: 3 × 3
##      V1    V2    V3
##   <dbl> <dbl> <dbl>
## 1     2     7     9
## 2     8     3     6
## 3     1     5     4
names(DF)[max.col(DF)]
## [1] "V3" "V1" "V2"
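One possible way to apply this method inside a tibble that also holds non-numeric columns is sketched below. The column names (Christians, Muslims, Hindus) and the values are made up for illustration, and ties.method = "first" is an optional choice that makes ties deterministic (max.col() breaks ties at random by default).

```r
library(tidyverse)

# Made-up shares; the real religion dataset may use other column names.
religion_toy <- tibble(
  Country    = c("A", "B", "C"),
  Year       = 2010,
  Christians = c(60, 10, 30),
  Muslims    = c(30, 80, 20),
  Hindus     = c(10, 10, 50)
)

# Keep only the numeric share columns before looking for the row-wise maximum.
shares <- religion_toy %>% select(-Country, -Year)

religion_toy <- religion_toy %>%
  mutate(Religion = names(shares)[max.col(shares, ties.method = "first")])

religion_toy
```

Dropping Country and Year first matters: max.col() would otherwise compare the year with the shares.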
Using pivot_longer(), make all datasets tidy.
children should now contain 3 columns: Country, Year and Fertility.
income should now contain 3 columns: Country, Year and Income.
pop should now contain 3 columns: Country, Year and Population.
We will only consider data from 1800 to 2018. Example of syntax using
the pipe operator %>%:
DF <- read_table("name 2010 2011 2012 2014
Kevin 10 11 12 123
Jane 122 56 23 4
"
)
DF
## # A tibble: 2 × 5
##   name  `2010` `2011` `2012` `2014`
##   <chr>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Kevin     10     11     12    123
## 2 Jane     122     56     23      4
DF %>%
  select(name, '2010':'2012') %>%
  pivot_longer(cols=-name,
               names_to="Year",
               values_to="Score",
               names_transform=list(Year = as.numeric))
## # A tibble: 6 × 3
##   name   Year Score
##   <chr> <dbl> <dbl>
## 1 Kevin  2010    10
## 2 Kevin  2011    11
## 3 Kevin  2012    12
## 4 Jane   2010   122
## 5 Jane   2011    56
## 6 Jane   2012    23
The line names_transform=list(Year = as.numeric) is here
to convert the character year values to numerical values.
Create a tibble called dat, containing the columns Country,
Year, Population, Religion,
Fertility and Income. Look into the
inner_join() function of the dplyr library
(which is part of the tidyverse library). For the
religion dataset, we will consider that the proportions of
2010 are representative of all times.
Modify the religion dataset so that it contains
only two columns, Country and Religion, the
data being filtered from the original religion dataset for
the year 2010.
Then build dat.
You should end up with a dataset like this one:
## # A tibble: 37,887 × 6
##    Country      Year Fertility Income Population Religion
##    <chr>       <dbl>     <dbl>  <dbl>      <dbl> <chr>
##  1 Afghanistan  1800         7    603    3280000 Muslims
##  2 Afghanistan  1801         7    603    3280000 Muslims
##  3 Afghanistan  1802         7    603    3280000 Muslims
##  4 Afghanistan  1803         7    603    3280000 Muslims
##  5 Afghanistan  1804         7    603    3280000 Muslims
##  6 Afghanistan  1805         7    603    3280000 Muslims
##  7 Afghanistan  1806         7    603    3280000 Muslims
##  8 Afghanistan  1807         7    603    3280000 Muslims
##  9 Afghanistan  1808         7    603    3280000 Muslims
## 10 Afghanistan  1809         7    603    3280000 Muslims
## # … with 37,877 more rows
In case you struggled to get there, download the archive with the
button at the top and get the dat tibble with dat <- read_csv("Data/dat.csv").
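For reference, the joining step can be sketched on toy data as follows. All tibbles and values here are made up, and chaining three inner_join() calls is only one possible way to assemble the table.

```r
library(tidyverse)

# Made-up tidy tables sharing the keys Country and Year.
children_toy <- tibble(Country = c("A", "A", "B"), Year = c(2000, 2001, 2000),
                       Fertility = c(2.1, 2.0, 3.5))
income_toy   <- tibble(Country = c("A", "A", "B"), Year = c(2000, 2001, 2000),
                       Income = c(30000, 31000, 5000))
pop_toy      <- tibble(Country = c("A", "A", "B"), Year = c(2000, 2001, 2000),
                       Population = c(1e6, 1.1e6, 5e6))
religion_toy <- tibble(Country = c("A", "A", "B", "B"),
                       Year = c(2000, 2010, 2000, 2010),
                       Religion = c("Christians", "Christians", "Muslims", "Muslims"))

# Keep the 2010 rows only, then drop Year so the join is on Country alone.
religion_2010 <- religion_toy %>%
  filter(Year == 2010) %>%
  select(Country, Religion)

dat_toy <- children_toy %>%
  inner_join(income_toy, by = c("Country", "Year")) %>%
  inner_join(pop_toy, by = c("Country", "Year")) %>%
  inner_join(religion_2010, by = "Country")

dat_toy
```

Because inner_join() keeps only the rows with a match in both tables, a country or year missing from any one dataset drops out of the result.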
Now that our dataset is ready, let’s plot it.
Load the library ggplot2 and set the global theme to
theme_bw() using theme_set().
Create a subset of dat for your country of
origin. For me, it will be dat_france.
Plot the evolution of the income per capita and the number of children per woman as a function of the years, and make it look like this (notice the kinks during the two world wars):
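A rough sketch of these three steps on made-up numbers is given below. The target figure is not reproduced here, so the two stacked panels with free y scales are a guess at its layout, and dat_toy, dat_france_toy and the Indicator/Value names are illustrative only.

```r
library(tidyverse)

# Set the global theme once for all subsequent plots.
theme_set(theme_bw())

# Made-up values; the real dat tibble has one row per country and year.
dat_toy <- tibble(
  Country   = rep(c("France", "Spain"), each = 3),
  Year      = rep(c(1900, 1950, 2000), times = 2),
  Income    = c(4000, 8000, 35000, 3000, 4000, 28000),
  Fertility = c(2.8, 2.9, 1.9, 4.7, 2.5, 1.2)
)

dat_france_toy <- dat_toy %>% filter(Country == "France")

# Stack both indicators in one column so each can get its own panel.
p <- dat_france_toy %>%
  pivot_longer(cols = c(Income, Fertility),
               names_to = "Indicator", values_to = "Value") %>%
  ggplot(aes(x = Year, y = Value)) +
  geom_line() +
  facet_wrap(~Indicator, ncol = 1, scales = "free_y")

p
```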
Create a subset of dat containing the data for your
country plus all the neighboring countries (if you come from an island, the
nearest countries…). For me, dat_france_region will contain
data from France, Spain, Italy, Switzerland, Germany, Luxembourg and
Belgium.
Plot again income and fertility as a function of the years, but add a color corresponding to the country and a point size corresponding to its population:
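As a sketch of the regional subset and of the extra aesthetics, %in% can filter on a vector of country names, and color and size can be mapped inside aes(). The data below are made up, and only Income is plotted to keep the example short.

```r
library(tidyverse)

# Made-up values for four countries, two of which form the "region".
dat_toy <- tibble(
  Country    = rep(c("France", "Spain", "Germany", "Japan"), each = 2),
  Year       = rep(c(1950, 2000), times = 4),
  Income     = c(8000, 35000, 4000, 28000, 7000, 37000, 3000, 34000),
  Population = c(42e6, 59e6, 28e6, 41e6, 70e6, 82e6, 83e6, 127e6)
)

region <- c("France", "Spain")
dat_region_toy <- dat_toy %>% filter(Country %in% region)

# Color distinguishes the countries; point size scales with the population.
p <- ggplot(dat_region_toy,
            aes(x = Year, y = Income, color = Country, size = Population)) +
  geom_point()

p
```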
Load the library plotly and make the previous graphs
interactive. You can make an interactive graph by calling
ggplotly(), like that:
library(plotly)
P <- ggplot(data = dat_france, aes(x=Population, y=Income))+
  geom_point()
ggplotly(P) # adding dynamicTicks=TRUE allows redrawing ticks when zooming in
An animated graph is obtained by adding frame = in the chart’s aesthetics. So
now, make the graph of the video! (You can also add the aesthetics
id=Country to show the country name in the popup when
hovering on a point.)
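A minimal sketch of such an animated chart, on made-up data, could look like this. The choice of axes (income on a log scale against fertility), the color by religion and the size by population are assumptions about the video, and ids = Country follows the spelling used in plotly's own animation examples rather than the id spelling mentioned above.

```r
library(tidyverse)
library(plotly)

# Made-up values: two countries observed over three years.
dat_toy <- tibble(
  Country    = rep(c("A", "B"), each = 3),
  Year       = rep(c(1900, 1950, 2000), times = 2),
  Income     = c(1000, 3000, 20000, 800, 1500, 6000),
  Fertility  = c(5.5, 3.0, 1.8, 6.5, 6.0, 3.2),
  Population = c(10e6, 20e6, 30e6, 5e6, 15e6, 60e6),
  Religion   = rep(c("Christians", "Muslims"), each = 3)
)

# frame = Year creates one animation frame per year; ggplot2 warns about
# the unknown aesthetics, but ggplotly() picks them up.
p <- ggplot(dat_toy, aes(x = Income, y = Fertility, color = Religion)) +
  geom_point(aes(size = Population, frame = Year, ids = Country)) +
  scale_x_log10()

fig <- ggplotly(p)
fig
```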